Abstract —Conventional speaker localization algorithms, based
merely on the received microphone signals, are often sensitive
to adverse conditions, such as high reverberation or low
signal-to-noise ratio (SNR). In some scenarios, e.g., in meeting
rooms or cars, it can be assumed that the source position is
confined to a predefined area, and that the acoustic parameters
of the environment are approximately fixed. Such scenarios give
rise to the assumption that the acoustic samples from the region
of interest have a distinct geometrical structure. In this paper,
we show that the high dimensional acoustic samples indeed lie
on a low dimensional manifold and can be embedded into a
low dimensional space. Motivated by this result, we propose
a semi-supervised source localization algorithm which recovers
the inverse mapping between the acoustic samples and their
corresponding locations. The idea is to use an optimization
framework based on manifold regularization, which imposes
smoothness constraints on possible solutions with respect to the
manifold. The proposed algorithm, termed Manifold Regularization
for Localization (MRL), is implemented in an adaptive manner.
The initialization is conducted with only a few labelled samples
attached with their respective source locations, and then the
system is gradually adapted as new unlabelled samples (with
unknown source locations) are received. Experimental results
show superior localization performance when compared with
a recently presented algorithm based on a manifold learning
approach and with the generalized cross-correlation (GCC)
algorithm as a baseline.

Index Terms —sound source localization, relative transfer
function (RTF), manifold regularization, reproducing kernel
Hilbert space (RKHS), diffusion distance.


I. Introduction and Motivation 

The problem of source localization has attracted the attention
of many researchers during the last decades. Various
applications rely on the recovery of the spatial position
of an emitting source, such as automated camera steering,
teleconferencing and beamformer steering for robust speech
recognition. For this reason, a considerable amount of effort
has been devoted to investigating this field, and a wide range
of methods has been proposed over the years. Common
to all localization approaches is the utilization of multiple
microphone recordings to infer the spatial information. The
fundamental challenge is to attain robust localization in poor
conditions, i.e., in the presence of high reverberation and
background noise.

Conventional localization approaches can be roughly divided
into two main categories: single- and dual-step approaches.
In the first class of algorithms, the source location
is determined directly from the microphone signals. The most
dominant member of this class is the maximum likelihood
(ML) algorithm. The algorithm is derived by applying the
ML criterion to a chosen statistical model of the received
signals. This optimization often involves maximization of
the output power of a beamformer, steered to all potential
source locations [?], [?], [?]. Another type of single-stage
approach is high resolution spectral estimation methods,
such as the well-known multiple signal classification (MUSIC)
algorithm [?], and the estimation of signal parameters via
rotational invariance (ESPRIT) techniques [?].

In the dual-step approaches category, the first stage involves
time difference of arrival (TDOA) estimation from
spatially separated microphone pairs. The classical method for
TDOA estimation is the generalized cross-correlation (GCC)
algorithm introduced in the landmark paper by Knapp and
Carter [?]. The GCC method relies on the assumption of a
reverberation-free model, such that the acoustic transfer function
(ATF), which relates the source and each of the microphones,
is a pure delay. However, this assumption does not hold in
the presence of room reverberation, resulting in performance
deterioration [?]. Consequently, improvements of the GCC
method for the reverberant case were proposed [?], [?], [?].

In the second algorithmic stage, the noisy TDOA estimates
are combined to carry out the actual localization. Each
TDOA estimate is associated with an infinite set of source
positions, lying on one half of a hyperboloid. The locus of
the speaker can be recovered by intersecting the hyperboloid
surfaces corresponding to the measurements of different pairs
of microphones. However, computing the intersection of
3-dimensional hyperboloids is a cumbersome task and tends to
be sensitive to TDOA estimation errors. In the far-field regime
the hyperboloid can be approximated by a cone, and a linear
intersection estimate can be applied [?]. Another simplifying
approach is to recast the hyperbolic equations into a spherical
form, and apply the nonlinear least squares approach [?].

All the aforementioned methods utilize the spatial information
conveyed by the received signals, but do not rely on any prior
information about the enclosure in which the measurements
are obtained. In some scenarios, e.g., in meeting rooms or
cars, the source position is confined to a predefined area. It
is reasonable to assume that representative samples from the
region of interest can be measured in advance. The structures
and patterns characterizing the representative
samples can be utilized for formulating a data-driven model
which relates the measured signals to their corresponding
source positions. The additional information may help to better
cope with the challenges posed by reverberation and noise.
So far, only a few attempts have been made to involve training
information for performing source localization.

Deleforge and Horaud [?] discussed a 2-D sound localization
scheme in the binaural hearing context. Their central
assumption is that the binaural observations lie on an intrinsic
manifold which is locally linear. Accordingly, they proposed
a probabilistic piecewise affine regression model that learns
the localization-to-interaural mapping and its inverse. In [?],
[?] the authors generalized the algorithm to deal with
multiple sources using a variational Expectation Maximization
(EM) framework.

In [?], the task of direction of arrival (DOA) estimation was
formulated as a classification problem and a learning-based
approach was presented. The authors proposed to extract features
from the GCC vectors and use a multilayer perceptron neural
network to learn the nonlinear mapping from such features to
the DOA.

Talmon et al. [18] introduced a supervised method based
on manifold learning, using diffusion kernels. The main idea
is to specify the fundamental controlling parameters of the
acoustic impulse response (AIR) using a manifold learning
scheme. Assuming that the position of the source is the only
varying degree of freedom of the system at hand, this process
is capable of recovering the unknown source locations. The
key point of the algorithm is to use an appropriate diffusion
kernel with a specifically tailored distance measure that is
capable of finding the underlying independent parameters
dominating the system. Talmon et al. [18] applied this
method to a single microphone system with a white Gaussian
noise (WGN) input.

In [19] we adopted the paradigm of [18] and adapted it to
a more realistic setting where the source is a speech signal
rather than a WGN signal. The power spectral density of the
speech signal is non-flat (as well as non-stationary). Hence,
the spectral variations may blur the variations attributed to
the different possible locations of the source. In order to
mitigate this problem, we made two major changes to
the algorithm presented in [18]: 1) a second microphone was
added and 2) the feature vector, which was originally based
on the correlation function, was replaced by a power
spectral density (PSD)-based vector. It should be emphasized
that in [18] the feature vector was associated with the AIR,
whereas in [19] the feature vector relied on the relative transfer
function (RTF), which is the Fourier transform of the relative
impulse response.

Though localization algorithms based on the diffusion
framework were shown to perform well, their fundamental
drawback is that they do not provide any guarantee of
optimality. In general, the diffusion-based methods are
implemented by a dual-stage approach. First, a low dimensional
embedding of the representative samples is recovered in an
unsupervised manner. Second, the new representation is used
to estimate the unknown locations based on the labelled
samples. The separation into two stages, where one is entirely
unsupervised and the other is entirely supervised, is not
necessarily optimal. Moreover, the unlabelled data are not
exploited for the estimation itself.

The significance of combining both labelled and unlabelled
data, in the source localization context, should be further
emphasized. Classification and regression algorithms which
rely on training data are very popular in various applications,
such as text categorization, handwriting recognition, image
classification and speech recognition. Nowadays, there exists a
rich database for each of these tasks, with a considerable number
of examples with true labels. Thus, these problems are
well suited to fully supervised approaches. In contrast,
in the localization problem the training should
fit the specific acoustic environment in which the measurements
are obtained; thus, we cannot create a general database
that corresponds to all possible acoustic scenarios. Instead, the
training set should be generated individually for each acoustic
environment. To obtain labelled data, one needs to generate
recordings in a controlled manner and calibrate each of them
precisely. Generating a large amount of labelled data is a
cumbersome and impractical process. However, unlabelled data is
freely available since it can be collected whenever someone is
speaking. This greatly motivates the use of semi-supervised
approaches, which mostly rely on unlabelled data, for the
source localization problem. Another motivation is related to
the special characteristics of the acoustic environment. As will
be further elaborated in the paper, the unlabelled data can
be utilized for forming a data-driven model of the acoustic
environment that is very useful for performing robust source
localization.

To address the limitations of the previous diffusion-based
approaches, and to better utilize the unlabelled data, we
propose the Manifold Regularization for Localization (MRL)
algorithm. The method recovers the inverse mapping between
the acoustic samples and their corresponding locations. The
algorithm is based on the concepts of manifold
regularization on a reproducing kernel Hilbert space (RKHS),
introduced by Belkin et al. [20]. The idea is to extend the
standard supervised estimation framework by adding an extra
regularization term which imposes a smoothness constraint on
possible solutions with respect to a data-driven model. The
model is learned empirically by forming a data adjacency
graph over both labelled and unlabelled training samples. In
this approach, the estimated location relies not only on the
labelled samples, but also on the unlabelled ones. Moreover, in
order to efficiently utilize unlabelled samples received during
runtime, we propose an adaptive implementation. The MRL
algorithm iteratively updates the system, based on the new
information which becomes available while accumulating new
unlabelled data. We compare the proposed algorithm with
the Diffusion Distance Search (DDS) method, which is a
diffusion-based algorithm. The discussion is supported by an
experimental study based on simulated data.

The paper is organized as follows. In Section II, we
formulate the problem in a general noisy and reverberant
environment. We motivate the choice of the RTF for forming
a feature vector and describe how it can be estimated
based on the microphone measurements. In Section III, we
discuss the existence of an acoustic manifold and formulate
an optimization problem which relies on a data-driven model
computed based on both labelled and unlabelled data. This
formulation leads to the MRL algorithm, which is sequentially
adapted by the unlabelled data accumulated during runtime.
We briefly describe our previous localization method based on
the diffusion framework [19] in Section IV. Accordingly, we
describe the derivation of the DDS algorithm, which conducts a
neighbours' search using the diffusion distance as an affinity
measure between RTFs. In Section V, we demonstrate
the algorithms' performance by an extensive simulation study.
A comparison between the MRL and the DDS algorithms is
carried out in Section VI. Section VII concludes the paper.


II. Problem Formulation 

We consider a standard enclosure, e.g., a conference room
or a car interior, with moderate reverberation time. A single
source located at $\mathbf{p} = [p_x, p_y, p_z]^T$ generates an unknown
speech signal $s(n)$, which is received by a pair of microphones.
The received signals, denoted by $x(n)$ and $y(n)$, are
contaminated by additive stationary noise, and are given
by:

$$x(n) = a_1(n, \mathbf{p}) * s(n) + u_1(n) \tag{1}$$

$$y(n) = a_2(n, \mathbf{p}) * s(n) + u_2(n) \tag{2}$$

where $n$ is the time index, $a_i(n, \mathbf{p})$, $i \in \{1, 2\}$, are the
corresponding AIRs relating the source at position $\mathbf{p}$ and each
of the microphones, and $u_i(n)$, $i \in \{1, 2\}$, are uncorrelated
WGN signals. Linear convolution is denoted by $*$. Each of
the AIRs is composed of the direct path between the source
and the microphone, as well as reflections from the surfaces
characterizing the enclosure. Consequently, even in moderate
reverberation conditions, the AIR is typically modelled as a
long FIR filter.
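
To make the signal model concrete, the following Python sketch generates a microphone pair according to (1)-(2). It is only an illustration: the helper names `toy_air` and `simulate_pair`, the exponentially decaying random tail used as a stand-in for a room impulse response, and all numerical values are assumptions and are not taken from the experimental setup of this paper.

```python
import numpy as np

def toy_air(delay, length, decay, rng):
    """Hypothetical AIR: unit direct path at `delay`, then a decaying random tail."""
    a = np.zeros(length)
    a[delay] = 1.0
    tail = np.arange(length - delay - 1)
    a[delay + 1:] = 0.3 * rng.standard_normal(len(tail)) * np.exp(-decay * tail)
    return a

def simulate_pair(s, a1, a2, noise_std, rng):
    """Return x(n), y(n): source filtered by each AIR plus uncorrelated white noise."""
    x = np.convolve(s, a1)[: len(s)] + noise_std * rng.standard_normal(len(s))
    y = np.convolve(s, a2)[: len(s)] + noise_std * rng.standard_normal(len(s))
    return x, y

rng = np.random.default_rng(0)
s = rng.standard_normal(4000)   # white stand-in for the speech signal s(n)
a1 = toy_air(delay=5, length=256, decay=0.03, rng=rng)
a2 = toy_air(delay=9, length=256, decay=0.03, rng=rng)
x, y = simulate_pair(s, a1, a2, noise_std=0.01, rng=rng)
```

In a realistic study the toy responses would be replaced by measured or simulated room impulse responses, and `s` by recorded speech.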

The purpose is to localize the speaker based on the currently
received microphone signals $x(n)$ and $y(n)$. We assume
that we are also given a set of prerecorded representative
samples from the region of interest. The training set is composed
of $N$ samples of measured signals $\{x_i(n), y_i(n)\}_{i=1}^{N}$
from various positions within the specified region. Only $l$
samples among the set are labelled, i.e., their originating
position $\mathbf{p}_i$ is known. The remaining $u = N - l$ samples are
unlabelled, namely, their corresponding source locations are
unknown. To summarize, the training set is composed of $l$ labelled
examples $\{x_i(n), y_i(n), \mathbf{p}_i\}_{i=1}^{l}$ and $u$ unlabelled examples
$\{x_i(n), y_i(n)\}_{i=l+1}^{N}$.

We are interested in a realistic scenario, where the amount 
of labelled data is significantly smaller than the amount of 
unlabelled data which can be collected online. Our goal is to 
build an on-line system which is initially given a small amount 
of labelled data, and is gradually adapted as new unlabelled 
samples are acquired. 

The first step is to define an appropriate feature vector that
faithfully represents the characteristics of the acoustic path and
is invariant to the other factors, i.e., the stationary noise and
the varying speech signals. An equivalent representation of (2)
is given by [?]:

$$\begin{aligned}
y(n) &= h(n, \mathbf{p}) * x(n) + v(n) \\
v(n) &= u_2(n) - h(n, \mathbf{p}) * u_1(n)
\end{aligned} \tag{3}$$

where $h(n, \mathbf{p})$ is the relative impulse response between the
microphones with respect to the source, satisfying
$a_2(n, \mathbf{p}) = h(n, \mathbf{p}) * a_1(n, \mathbf{p})$. In (3), the relative impulse
response represents the system relating the measured signal $x(n)$
as an input and the measured signal $y(n)$ as an output.

For convenience, we represent (3) in the frequency domain.
The Fourier transform of the relative impulse response, termed
the RTF, is obtained by:

$$H(k, \mathbf{p}) = \frac{S_{yx}(k, \mathbf{p})}{S_{xx}(k, \mathbf{p}) - S_{u_1 u_1}(k)} = \frac{S_{ss}(k) A_2(k, \mathbf{p}) A_1^*(k, \mathbf{p})}{S_{ss}(k) |A_1(k, \mathbf{p})|^2} = \frac{A_2(k, \mathbf{p})}{A_1(k, \mathbf{p})}, \quad k = 0, \ldots, D-1 \tag{4}$$

where $H(k, \mathbf{p})$ is the RTF, $S_{yx}(k, \mathbf{p})$ is the cross power
spectral density (CPSD) between $y(n)$ and $x(n)$, $S_{xx}(k, \mathbf{p})$
is the PSD of $x(n)$, $S_{u_1 u_1}(k)$ is the PSD of the noise in
the first microphone $u_1(n)$, and $S_{ss}(k)$ is the PSD of the
source $s(n)$. $A_1(k, \mathbf{p})$ and $A_2(k, \mathbf{p})$ are the ATFs of the
respective AIRs, and $k$ denotes a discrete frequency index. The
choice of the value of $D$ should balance the tradeoff between
the correspondence with the relative impulse response length
(large value) and latency considerations (small value).

Since $A_1(k, \mathbf{p})$ and $A_2(k, \mathbf{p})$ are unavailable, we estimate
the RTF by:

$$\hat{H}(k, \mathbf{p}) \equiv \frac{\hat{S}_{yx}(k, \mathbf{p})}{\hat{S}_{xx}(k, \mathbf{p})} \tag{5}$$

Note that this estimator is biased since we neglect the PSD
of the noise $S_{u_1 u_1}(k)$. Alternatively, unbiased estimators can
be used, such as the RTF estimator based on the non-stationarity
of the speech signal [?]. However, we are not
concerned with robust estimation of the RTF since we will
show that the proposed method is insensitive to this type
of error. Accordingly, we define the feature vector
$\mathbf{h}_{\mathbf{p}} = [\hat{H}(0, \mathbf{p}), \ldots, \hat{H}(D-1, \mathbf{p})]^T$ as the concatenation of estimated
RTF values in the $D$ frequency bins. In practice, we discard
high frequencies in which the ratio in (5) is meaningless due
to weak speech components. For the sake of clarity, we omit
the dependency on the position, and denote the RTF feature
vector by $\mathbf{h}$.
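
A minimal Python sketch of the biased estimator in (5) is given below, assuming Welch-style averaging of windowed FFT frames to estimate the CPSD and the PSD. The function name `estimate_rtf`, the frame length, the hop size and the `keep_bins` truncation (standing in for discarding high frequencies) are illustrative choices, not the settings used in this paper.

```python
import numpy as np

def estimate_rtf(x, y, nfft=256, hop=128, keep_bins=None):
    """Biased RTF estimate: averaged CPSD of (y, x) divided by averaged PSD of x."""
    window = np.hanning(nfft)
    starts = range(0, len(x) - nfft + 1, hop)
    X = np.array([np.fft.rfft(window * x[i:i + nfft]) for i in starts])
    Y = np.array([np.fft.rfft(window * y[i:i + nfft]) for i in starts])
    s_yx = np.mean(Y * np.conj(X), axis=0)   # cross-PSD estimate between y and x
    s_xx = np.mean(np.abs(X) ** 2, axis=0)   # PSD estimate of x
    h = s_yx / s_xx
    # Optionally keep only the lowest bins, where speech energy is assumed significant.
    return h if keep_bins is None else h[:keep_bins]

rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
y = np.convolve(x, [1.0, 0.5])[: len(x)]     # toy relative impulse response
h_feat = estimate_rtf(x, y, keep_bins=64)    # complex feature vector with 64 entries
```

Note that `np.fft.rfft` returns only the non-negative frequency bins, so the number of bins here is `nfft // 2 + 1` before truncation; how this maps to $D$ in (4) is left as an assumption.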


III. Manifold Regularization for Localization 

Our goal is to recover the target function which transforms
each RTF to its corresponding location, based on the training
set comprised of both labelled and unlabelled samples. Finding
such an inverse mapping is non-trivial due to the complex
nonlinear relation between the high dimensional RTFs and
the originating locations. To mitigate this problem we adopt
the concepts of manifold regularization, introduced by Belkin
et al. [22], [20], and present them in the light of the acoustic
environment and, in particular, for the source localization
problem at hand. It is important to note that, originally, the
concepts of manifold regularization were implemented for
classification, whereas here they are applied to the problem of
source localization, which is a regression problem.

Two guiding principles are at the core of the proposed
method, termed Manifold Regularization for Localization
(MRL). First, instead of using complex variational
calculus for estimating the target function, we assume that the
function resides in a reproducing kernel Hilbert space (RKHS).
Due to the special characteristics of the functions belonging
to the RKHS, the problem can be formulated simply as a
system of linear equations. Second, we incorporate geometrical
considerations, i.e., we use the information implied by
the intrinsic patterns observed in the set of RTFs to build a
data-driven model. Then, the solution is constrained to behave
smoothly with respect to this data-driven model, representing
the intrinsic structure of the RTFs.


A. The Acoustic Manifold 

As mentioned in Section II, the RTFs have a high dimensional
representation in $\mathbb{C}^D$ that corresponds to the vast
amount of reflections from the different surfaces characterizing
the enclosure. We assume that the RTF samples, drawn from
a specific region of interest in the enclosure, are not spread
uniformly in the entire space of $\mathbb{C}^D$. Instead, they are confined
to a compact manifold $\mathcal{M}$ of dimension $d$, which is much
smaller compared to the dimension of the ambient space, i.e.,
$d \ll D$. This assumption is justified by the fact that the RTFs
are influenced by only a small set of parameters related to
the physical characteristics of the environment, such as the
enclosure dimensions and shape, the surfaces' materials and
the positions of the microphones and the source. Moreover, we
focus on a static configuration, in which the properties of the
enclosure and the position of the microphones remain fixed.
In such an acoustic environment, the only varying degree of
freedom is the source location. Accordingly, we assume that
the RTFs can be intrinsically embedded in a low dimensional
manifold which is governed by the position of the source. The
existence of such an acoustic manifold was discussed in detail
in [23], and was demonstrated with respect to the DOA of
the source. The main results will be briefly described in the
experimental part, in Section V-B.


Roughly, we consider a manifold of reduced dimensions 
which may have a complex nonlinear structure. However, in 
small neighbourhoods the manifold is locally linear, meaning 
that in the vicinity of each point it is flat and coincides with 
the tangent plane to the manifold at that point. Hence, the 
Euclidean distance can faithfully measure affinities between 
points that reside close to each other on the manifold. For
larger scales, the Euclidean distance is meaningless, and we 
should rather use the geodesic distance on the manifold. 
However, the geodesic distance can be evaluated only when 
the structure of the manifold is known. In order to respect the 
manifold structure we will only examine local connections 
between points and disregard larger distances. 


B. Background of Reproducing Kernel Hilbert Spaces 

Our goal is to find the inverse-mapping function that receives
an RTF sample and returns the corresponding source
location. In general, estimating a function that minimizes a
cost function is a cumbersome task that requires complex
mathematical tools, such as variational calculus. One simplifying
approach is to assume that the target function belongs
to a certain class of functions with a specific structure. For
example, it can be assumed that the target function belongs
to a certain space of functions, spanned by an orthogonal
basis. Hence, the target function can be represented by a
linear combination of the basis functions, where the weights
are determined according to the projections of the function
on each of the basis functions. In our case we assume that
the target function belongs to a reproducing kernel Hilbert
space (RKHS) associated with a unique kernel function that
evaluates each function in the space by an inner product.
Rather than computing the basis functions spanning the space,
we use an analogous representation with linear combinations
of the kernel function. According to this representation, the
problem can be converted to a simple linear estimation of a
finite set of parameters.

We will first present the kernel function and its properties,
and then define the RKHS and discuss its representation by the
kernel function, which will be used for deriving the optimization
problem in Section III-C. In Appendix A, we show that the
eigenfunctions associated with the kernel form an orthogonal
basis for the RKHS, and discuss an analogous representation
in terms of these basis functions.

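As implied by its name, an RKHS is associated with a kernel
function $k : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ that measures a pairwise affinity
between RTFs. The kernel function must satisfy the following
two conditions:

1) Symmetry: $k(\mathbf{h}_i, \mathbf{h}_j) = k(\mathbf{h}_j, \mathbf{h}_i)$ $\forall \mathbf{h}_i, \mathbf{h}_j \in \mathcal{M}$.

2) Positive semi-definiteness: the $n \times n$ matrix $\mathbf{K}$ with
$K_{ij} = k(\mathbf{h}_i, \mathbf{h}_j)$ is positive semi-definite, for any arbitrary
finite set of points $\{\mathbf{h}_i\}_{i=1}^{n} \subset \mathcal{M}$.

Another essential requirement of the kernel is that it
defines a notion of locality, determined in accordance with
a scaling factor $\varepsilon_k$: for $\|\mathbf{h}_i - \mathbf{h}_j\| \ll \varepsilon_k$, $k(\mathbf{h}_i, \mathbf{h}_j) \to 1$, and
for $\|\mathbf{h}_i - \mathbf{h}_j\| \gg \varepsilon_k$, $k(\mathbf{h}_i, \mathbf{h}_j) \to 0$. A common choice is to
use a Gaussian kernel with variance $\varepsilon_k$:

$$k(\mathbf{h}_i, \mathbf{h}_j) = \exp\left\{ -\frac{\|\mathbf{h}_i - \mathbf{h}_j\|^2}{2\varepsilon_k} \right\} \tag{6}$$

Clearly, the Gaussian kernel is a symmetric positive
semi-definite function, and satisfies the locality property.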

The locality property is of major importance in our case, 
since the kernel receives RTFs, sampled from the manifold 
$\mathcal{M}$. As discussed above, the manifold is in general nonlinear
and is assumed to be locally linear over small patches. Due 
to its property of locality, the kernel function constitutes an 
affinity measure that respects the manifold structure. 
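
As an illustration of the kernel discussed above, the following Python sketch builds a Gram matrix for a set of complex feature vectors and checks numerically the symmetry and positive semi-definiteness conditions. It assumes the Gaussian form $\exp\{-\|\mathbf{h}_i - \mathbf{h}_j\|^2 / (2\varepsilon_k)\}$; the function names and the random toy data are hypothetical and serve only as placeholders for estimated RTF vectors.

```python
import numpy as np

def pairwise_sq_dists(H):
    """Squared Euclidean distances between the rows of H (rows may be complex)."""
    G = H @ H.conj().T
    sq = np.real(np.diag(G))
    d2 = sq[:, None] + sq[None, :] - 2.0 * np.real(G)
    return np.maximum(d2, 0.0)   # clip tiny negative values caused by rounding

def gaussian_kernel(H, eps_k):
    """Gram matrix with entries exp(-||h_i - h_j||^2 / (2 * eps_k))."""
    return np.exp(-pairwise_sq_dists(H) / (2.0 * eps_k))

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))   # toy "RTF" vectors
K = gaussian_kernel(H, eps_k=10.0)

symmetric = np.allclose(K, K.T)
min_eig = np.linalg.eigvalsh(K).min()   # non-negative (up to rounding) for a PSD matrix
```

The scale `eps_k` controls locality: shrinking it drives the off-diagonal entries towards zero, so only nearby samples remain similar.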

An RKHS, denoted by $\mathcal{H}_k$, is a Hilbert space of functions,
mapping each $\mathbf{h} \in \mathcal{M}$ to $\mathbb{R}$, which is associated with a kernel
$k$. We skip the formal definition of an RKHS (for details
see [24], [25]). Instead, we state the two main properties of
an RKHS:

• For all $\mathbf{h} \in \mathcal{M}$, $k_{\mathbf{h}}(\cdot) \in \mathcal{H}_k$.

• The reproducing property: for all $f \in \mathcal{H}_k$ and $\mathbf{h} \in \mathcal{M}$,
$\langle f(\cdot), k_{\mathbf{h}}(\cdot) \rangle = f(\mathbf{h})$,

where for each $\mathbf{h} \in \mathcal{M}$ we define the real valued function
$k_{\mathbf{h}}(\cdot) = k(\mathbf{h}, \cdot)$. The first property simply states that the RKHS
contains all functions defined by the kernel $k$ at some point
on the manifold. The second property implies that the kernel
$k$ evaluates all the functions in
the space by an inner product. For example, in $L_2$ the delta
function has the reproducing property since it evaluates all the
functions in $L_2$: $\langle \delta(\mathbf{h}, \cdot), f(\cdot) \rangle_{L_2} = f(\mathbf{h})$. However, this does
not define an RKHS, since the delta function does not belong
to $L_2$.

We have seen that an RKHS is associated with a unique
reproducing kernel function. In the opposite direction, by
the Moore-Aronszajn theorem, every symmetric, positive
definite kernel $k$ defines a unique RKHS $\mathcal{H}_k$ that is given by
the completion (an expansion that includes the limits of all
Cauchy sequences) of the space of functions spanned by the
set $\{k_{\mathbf{h}_i}(\cdot)\}$:

$$\left\{ f \;\middle|\; f(\cdot) = \sum_i a_i k_{\mathbf{h}_i}(\cdot);\ i \in \mathbb{N},\ a_i \in \mathbb{R},\ \mathbf{h}_i \in \mathcal{M} \right\} \tag{7}$$

with respect to the following inner product:

$$\langle f(\cdot), g(\cdot) \rangle = \left\langle \sum_i a_i k_{\mathbf{h}_i}(\cdot), \sum_j b_j k_{\mathbf{h}_j}(\cdot) \right\rangle = \sum_{i,j} a_i b_j k(\mathbf{h}_i, \mathbf{h}_j). \tag{8}$$

It can be easily verified that the two mentioned properties
of an RKHS are satisfied by this definition. Obviously, the
reproducing kernel belongs to the space, and the reproducing
property holds, since:

$$\langle f(\cdot), k_{\mathbf{h}}(\cdot) \rangle = \left\langle \sum_i a_i k_{\mathbf{h}_i}(\cdot), k_{\mathbf{h}}(\cdot) \right\rangle = \sum_i a_i k(\mathbf{h}_i, \mathbf{h}) = f(\mathbf{h}). \tag{9}$$

An additional view of an RKHS, based on Mercer's theorem
[26], is discussed in Appendix A. According to this
viewpoint, any function $f \in \mathcal{H}_k$ can be represented by an
orthogonal basis of functions $\{\psi_i(\cdot)\}$ related to the kernel $k$:

$$\mathcal{H}_k = \left\{ f \;\middle|\; f(\cdot) = \sum_i \beta_i \psi_i(\cdot) \text{ and } \|f\|_{\mathcal{H}_k} < \infty \right\}. \tag{10}$$

To circumvent the computation of the basis functions, we use
the representation of (7), in terms of the kernel function.


C. Optimization and Manifold Regularization 

In this section we present the optimization over the target
function assuming that it belongs to an RKHS $\mathcal{H}_k$ with a
reproducing kernel $k$. Formally, we search for a function
$f_c : \mathbb{C}^D \to \mathbb{R}$, $c \in \{x, y, z\}$, which is the inverse mapping
between an RTF and its corresponding position, i.e., $f_c(\mathbf{h}) = p_c$.
In this paper we focus on estimating one position coordinate;
thus, we omit the coordinate subscript. However, the analysis,
the results and the algorithm described here can be naturally
extended to estimating several coordinates.

The search will be formulated by the following optimization
problem:

$$f^* = \operatorname*{argmin}_{f \in \mathcal{H}_k} \frac{1}{l} \sum_{i=1}^{l} V(f(\mathbf{h}_i), p_i) + \gamma_k \|f\|^2_{\mathcal{H}_k} + \gamma_M \|f\|^2_{\mathcal{M}} \tag{11}$$

where $\|\cdot\|^2_{\mathcal{H}_k}$ is the RKHS norm that corresponds to the inner
product defined in (8), $\|\cdot\|^2_{\mathcal{M}}$ is the intrinsic norm defined with
respect to the manifold $\mathcal{M}$, and $\gamma_k$, $\gamma_M$ are scalar parameters.
The optimization problem consists of three components. The
first term is an empirical cost function defined over the labelled
samples $\{\mathbf{h}_i\}_{i=1}^{l}$. The function $V$ evaluates the extent of
correspondence between the evaluations of the target function
$f(\mathbf{h}_i)$ and the true labels $p_i$. In our case, we set the cost
function to be the squared loss function $(p_i - f(\mathbf{h}_i))^2$. Note
that while the $L_2$ norm is not suitable for comparing between
RTFs [?], it is a reasonable choice for evaluating localization
quality.

The last two terms in (11) are regularization conditions.
Roughly, their role is to prevent the solution from overfitting
to the labelled examples. The second term is the Tikhonov
regularization, which penalizes the RKHS norm of the function
to impose a smoothness condition in $\mathcal{H}_k$. The additional
regularization term, defined by the last term in (11), was introduced
by Belkin et al. [20]. This is an intrinsic regularization that
represents a smoothness penalty of the function with respect
to the manifold $\mathcal{M}$.

One natural choice for the intrinsic norm is to measure the
gradient of the function along the manifold, i.e., to measure
the variability of the function with respect to small movements
on the manifold. Since the manifold structure is unknown, this
term should be approximated on the basis of both labelled and
unlabelled samples. The training set $\{\mathbf{h}_i\}_{i=1}^{N}$, which includes
different realizations of possible acoustic paths, can be viewed
as a discrete sampling of the manifold $\mathcal{M}$. The manifold can
be empirically represented by a graph in which the training
samples are the graph nodes, and the weights of the edges are
defined according to an $N \times N$ adjacency matrix $\mathbf{W}$ between
the samples:

$$W_{ij} = \begin{cases} \exp\left\{ -\dfrac{\|\mathbf{h}_i - \mathbf{h}_j\|^2}{\varepsilon_w} \right\} & \text{if } \mathbf{h}_j \in \mathcal{N}_i \text{ or } \mathbf{h}_i \in \mathcal{N}_j \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

where $\mathcal{N}_j$ is a set consisting of the $d$ nearest-neighbours of
$\mathbf{h}_j$ among $\{\mathbf{h}_i\}_{i=1}^{N}$.
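
The graph construction can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this paper: the function name `knn_adjacency`, the argument names, the scale `eps_w` in the exponent, and the choice to exclude each sample from its own neighbour set are all assumptions.

```python
import numpy as np

def knn_adjacency(H, n_neighbours, eps_w):
    """Truncated Gaussian adjacency: nonzero only between nearest neighbours."""
    diff = H[:, None, :] - H[None, :, :]
    d2 = np.sum(np.abs(diff) ** 2, axis=-1)           # squared distances
    N = H.shape[0]
    # Assumption: a sample is not counted among its own nearest neighbours.
    d2_no_self = d2 + np.diag(np.full(N, np.inf))
    nearest = np.argsort(d2_no_self, axis=1)[:, :n_neighbours]
    mask = np.zeros((N, N), dtype=bool)
    mask[np.repeat(np.arange(N), n_neighbours), nearest.ravel()] = True
    mask = mask | mask.T                               # "h_j in N_i or h_i in N_j"
    return np.where(mask, np.exp(-d2 / eps_w), 0.0)

rng = np.random.default_rng(0)
H = rng.standard_normal((20, 4))                       # toy feature vectors
W = knn_adjacency(H, n_neighbours=3, eps_w=4.0)
```

The symmetrization by the logical "or" makes `W` symmetric even though the nearest-neighbour relation itself is not.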


The adjacency matrix $\mathbf{W}$ is used to form the graph Laplacian
$\mathbf{L}$, by $\mathbf{L} = \mathbf{D} - \mathbf{W}$, where $\mathbf{D}$ is a diagonal matrix with
$D_{ii} = \sum_{j=1}^{N} W_{ij}$. It can be shown, under certain conditions,
that the graph Laplacian $\mathbf{L}$ converges to a differential operator
on the manifold $\mathcal{M}$, as was discussed in detail in [27], [28],
[29]. Hence, the gradient of the function along the manifold
can be approximated using the graph Laplacian. Accordingly,
an intrinsic measure of data-dependent smoothness is given
by $\|f\|^2_{\mathcal{M}} = \mathbf{f}^T \mathbf{L} \mathbf{f}$, where $\mathbf{f} = [f(\mathbf{h}_1), \ldots, f(\mathbf{h}_N)]^T$. Thus, the
optimization problem (11) can be recast as:

$$f^* = \operatorname*{argmin}_{f \in \mathcal{H}_k} \frac{1}{l} \sum_{i=1}^{l} (p_i - f(\mathbf{h}_i))^2 + \gamma_k \|f\|^2_{\mathcal{H}_k} + \gamma_M \mathbf{f}^T \mathbf{L} \mathbf{f}. \tag{13}$$

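Further insight can be obtained by the expansion of the
intrinsic regularization:

$$\begin{aligned}
\mathbf{f}^T \mathbf{L} \mathbf{f} &= \sum_{i,j=1}^{N} f(\mathbf{h}_i) L_{ij} f(\mathbf{h}_j) \\
&= \sum_{i=1}^{N} \left( \sum_{j=1}^{N} W_{ij} \right) f^2(\mathbf{h}_i) - \sum_{i,j=1}^{N} W_{ij} f(\mathbf{h}_i) f(\mathbf{h}_j) \\
&= \sum_{i,j=1}^{N} W_{ij} f^2(\mathbf{h}_i) - \sum_{i,j=1}^{N} W_{ij} f(\mathbf{h}_i) f(\mathbf{h}_j) \\
&= \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \left( f(\mathbf{h}_i) - f(\mathbf{h}_j) \right)^2
\end{aligned} \tag{14}$$

Intuitively, in (14), a large $W_{ij}$, corresponding to strong
similarity between $\mathbf{h}_i$ and $\mathbf{h}_j$, implies a tendency of $f(\mathbf{h}_i)$ and
$f(\mathbf{h}_j)$ to be close to each other. For this reason, a truncated
kernel was chosen in (12), since it is reasonable to penalize
the function only when the corresponding RTFs reside in the
same local neighbourhood.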
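
The identity in (14) is easy to check numerically. The short Python sketch below does so for an arbitrary symmetric, non-negative weight matrix; the random `W` and `f` are toy stand-ins (assumptions for illustration), not quantities derived from acoustic data.

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, with D the diagonal matrix of row sums of W."""
    return np.diag(W.sum(axis=1)) - W

def smoothness_quadratic(W, f):
    """Left-hand side of the identity: f^T L f."""
    return f @ graph_laplacian(W) @ f

def smoothness_pairwise(W, f):
    """Right-hand side: half the weighted sum of squared pairwise differences."""
    return 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)

rng = np.random.default_rng(1)
W = rng.random((6, 6))
W = 0.5 * (W + W.T)             # symmetric, non-negative weights
np.fill_diagonal(W, 0.0)
f = rng.standard_normal(6)      # toy function values f(h_1), ..., f(h_N)

lhs = smoothness_quadratic(W, f)
rhs = smoothness_pairwise(W, f)
```

Both expressions agree up to rounding, and the pairwise form makes it evident that the penalty is non-negative whenever the weights are.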

Note that (13) is a semi-supervised formulation, since it
involves both labelled and unlabelled samples. While the first
term is merely based on the labelled samples, the last two
terms are based on both labelled and unlabelled data. The
two regularization parameters $\gamma_k$ and $\gamma_M$ balance between
maximizing the correspondence to the labelled data, and
maintaining low complexity of possible solutions. In some respects,
both regularization terms try to relate the target function to
the manifold $\mathcal{M}$ by the two different kernels defined in (6)
and (12). Involving two kernels associated with different scales
represents two different measurements of smoothness with
respect to the manifold. Since the real structure of the manifold
is unknown, the combination of both kernels is essential for
obtaining a more accurate modelling of the manifold.

The Representer theorem [30] states that the minimizer $f^*$
of (13) is a linear combination of the kernel functions only
in the set of labelled and unlabelled points $\{\mathbf{h}_i\}_{i=1}^{N}$, i.e., it is
given by:

$$f^*(\mathbf{h}) = \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \mathbf{h}) \tag{15}$$

where $\{a_i\}$ are the interpolation weights. In Appendix B, we
provide the proof of the theorem [20], which is derived by
a simple orthogonality argument, and relies on the specific
structure of the functions in $\mathcal{H}_k$ implied by (7) together
with the reproducing property that uniquely characterizes the
RKHS. The Representer theorem dramatically simplifies the
regularized optimization problem of (13), so it can be formulated
as a linear optimization over a finite set of parameters
$\{a_i\}$.


D. Derivation of the Localization Algorithm 

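In the previous section we formulated an optimization problem with manifold regularization for recovering the target function $f$ in (13). Based on the Representer theorem stated in (15), the optimization boils down to estimating the interpolation weights $\{a_i\}$. Substituting (15) in (13) yields a second-order polynomial objective function of $\mathbf{a} = [a_1, \ldots, a_N]^T$:

$$\mathbf{a}^* = \underset{\mathbf{a} \in \mathbb{R}^N}{\operatorname{argmin}} \; \frac{1}{l} (\mathbf{q} - \mathbf{J}\mathbf{K}\mathbf{a})^T (\mathbf{q} - \mathbf{J}\mathbf{K}\mathbf{a}) + \gamma_k \mathbf{a}^T \mathbf{K} \mathbf{a} + \gamma_M \mathbf{a}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{a} \tag{16}$$

where $\mathbf{K}$ is the $N \times N$ Gram matrix of $k$ defined by $K_{ij} = k(\mathbf{h}_i, \mathbf{h}_j)$; $\mathbf{I}_N$ is the $N \times N$ identity matrix; $\mathbf{J}$ is an $N \times N$ diagonal matrix, $\mathbf{J} = \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0)$, with $l$ ones and $u$ zeros on its diagonal (it functions as an indicator for the labelled samples in the set); and $\mathbf{q} = [p_1, \ldots, p_l, 0, \ldots, 0]^T$ is a label vector comprising the $l$ known positions of the labelled samples, with $q_i = 0$ for all $i > l$. Differentiating with respect to $\mathbf{a}$ and equating to zero yields:

$$\frac{1}{l} (-\mathbf{J}\mathbf{K})^T (\mathbf{q} - \mathbf{J}\mathbf{K}\mathbf{a}) + (\gamma_k \mathbf{K} + \gamma_M \mathbf{K}\mathbf{L}\mathbf{K}) \mathbf{a} = \mathbf{0} \tag{17}$$

By rearranging (17), we obtain the following linear system:

$$\left[ \mathbf{J}\mathbf{K} + l \gamma_k \mathbf{I}_N + l \gamma_M \mathbf{L}\mathbf{K} \right] \mathbf{a} = \mathbf{q}. \tag{18}$$

Accordingly, the interpolation weights $\mathbf{a}$ are given by:

$$\mathbf{a}^* = \left[ \mathbf{J}\mathbf{K} + l \gamma_k \mathbf{I}_N + l \gamma_M \mathbf{L}\mathbf{K} \right]^{-1} \mathbf{q}. \tag{19}$$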
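The rearrangement leading from (17) to (18) can be spelled out as follows. This is a sketch that assumes the Gram matrix $\mathbf{K}$ is symmetric and invertible, and it uses the properties $\mathbf{J}^T = \mathbf{J}$, $\mathbf{J}\mathbf{J} = \mathbf{J}$ and $\mathbf{J}\mathbf{q} = \mathbf{q}$ that follow from the definitions of $\mathbf{J}$ and $\mathbf{q}$; the stationarity condition is multiplied by $l$ in the second line.

```latex
\begin{align*}
\tfrac{1}{l}(-\mathbf{J}\mathbf{K})^T(\mathbf{q}-\mathbf{J}\mathbf{K}\mathbf{a})
  + (\gamma_k\mathbf{K}+\gamma_M\mathbf{K}\mathbf{L}\mathbf{K})\,\mathbf{a} &= \mathbf{0} \\
-\mathbf{K}\mathbf{J}\mathbf{q} + \mathbf{K}\mathbf{J}\mathbf{J}\mathbf{K}\mathbf{a}
  + l\gamma_k\mathbf{K}\mathbf{a} + l\gamma_M\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{a} &= \mathbf{0} \\
\mathbf{K}\left[\mathbf{J}\mathbf{K} + l\gamma_k\mathbf{I}_N
  + l\gamma_M\mathbf{L}\mathbf{K}\right]\mathbf{a} &= \mathbf{K}\mathbf{q} \\
\left[\mathbf{J}\mathbf{K} + l\gamma_k\mathbf{I}_N
  + l\gamma_M\mathbf{L}\mathbf{K}\right]\mathbf{a} &= \mathbf{q}
\end{align*}
```

The last line follows by multiplying from the left by $\mathbf{K}^{-1}$. Even when $\mathbf{K}$ is close to singular, any solution of the last line also satisfies the line before it.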

Thus far, the computations were carried out offline based only on the training set, composed of both labelled and unlabelled samples. The input to the algorithm is a new pair of measurements $\{x(n), y(n)\}$, generated by an unknown source from an unknown location on the manifold. The corresponding feature vector $\mathbf{h}$ is estimated in the same way as for the training samples. The kernel between the new sample $\mathbf{h}$ and each of the training samples $\{\mathbf{h}_i\}_{i=1}^{N}$ is evaluated. The position of the new measurement is estimated according to (15) by a weighted sum of these kernel evaluations, multiplied by the weights given by (19):

$$\hat{p} = \hat{f}(\mathbf{h}) = \sum_{i=1}^{N} a_i^* k(\mathbf{h}_i, \mathbf{h}) \tag{20}$$
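The offline solution of the linear system and the evaluation of a new sample can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the Gaussian form of both kernels, the nearest-neighbour truncation rule, the toy two-dimensional features on an arc (standing in for RTFs) and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, eps):
    """Gaussian affinities exp(-||a - b||^2 / eps) between rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / eps)

def mrl_fit(H, p_labelled, eps_k, eps_w, n_neighbours, gamma_k, gamma_m):
    """Solve [JK + l*gamma_k*I + l*gamma_M*LK] a = q for the weights a.

    H holds one feature vector per row, with the l labelled samples first.
    """
    N, l = H.shape[0], len(p_labelled)
    K = gaussian_kernel(H, H, eps_k)
    # Truncated kernel: keep each sample's nearest neighbours (assumed rule).
    W = gaussian_kernel(H, H, eps_w)
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :n_neighbours]
    keep = np.zeros_like(W, dtype=bool)
    keep[np.arange(N)[:, None], idx] = True
    W = np.where(keep | keep.T, W, 0.0)       # symmetric neighbourhood graph
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian L = D - W
    J = np.diag(np.r_[np.ones(l), np.zeros(N - l)])
    q = np.r_[np.asarray(p_labelled, dtype=float), np.zeros(N - l)]
    A = J @ K + l * gamma_k * np.eye(N) + l * gamma_m * (L @ K)
    return np.linalg.solve(A, q)

def mrl_predict(a, H_train, h_new, eps_k):
    """Evaluate p_hat = sum_i a_i k(h_i, h) for each row of h_new."""
    return gaussian_kernel(np.atleast_2d(h_new), H_train, eps_k) @ a

def features(theta_deg):
    """Toy stand-in for RTFs: points on a circular arc."""
    t = np.deg2rad(theta_deg)
    return np.column_stack([np.cos(t), np.sin(t)])

rng = np.random.default_rng(0)
theta_lab = np.linspace(10.0, 60.0, 6)        # 6 labelled angles
theta_unl = rng.uniform(10.0, 60.0, 94)       # 94 unlabelled angles
H = features(np.r_[theta_lab, theta_unl])

a = mrl_fit(H, theta_lab, eps_k=0.1, eps_w=0.01, n_neighbours=5,
            gamma_k=1e-4, gamma_m=1e-4)

theta_test = rng.uniform(10.0, 60.0, 50)
theta_hat = mrl_predict(a, H, features(theta_test), eps_k=0.1)
rmse = np.sqrt(np.mean((theta_hat - theta_test) ** 2))
```

The system matrix is not symmetric, so a general solver is used instead of a Cholesky factorization. Only the first `l` entries of `q` carry labels, which is how the unlabelled samples enter the solution through `K` and `L` alone.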

E. Adaptive Manifold Regularization for Localization 

In this section we summarize the algorithm and formulate it in a dual-stage structure. We take advantage of the fact that the optimization is derived in a semi-supervised manner, and propose an adaptive version. The algorithm is composed of two main parts: system adaptation and localization. In the adaptation stage, the interpolation weights $\mathbf{a}^*$ are computed according to (19) based on the labelled and unlabelled samples collected up to this point in time. In the localization stage, we receive a new pair of measurements $\{x(n), y(n)\}$ of an unknown source from an unknown location, and estimate the corresponding position based on the weights computed in the previous stage. The system is initialized with a small amount of labelled data, and after several iterations of the localization stage, the new unlabelled samples received during runtime are utilized for system adaptation. Note that the adaptation process can potentially adjust to changes in the environmental conditions. However, this attribute was not examined in the current paper, which focuses on static configurations. Examining dynamic scenarios with changing environmental conditions is left for future work.

The proposed MRL algorithm is summarized in Algorithm 1 and is illustrated in a flow diagram in Fig. 1. The flow diagram emphasizes the duality between the two parts of the algorithm and the interaction between them. In the downward direction, the model of the system derived in the adaptation part is utilized for localization. In the upward direction, the new unlabelled samples acquired in the localization stage are propagated and utilized for system adaptation. Moreover, note that the two rightmost (blue) blocks are semi-supervised whereas the rest of the blocks are unsupervised.

It should be emphasized that we do not present an update mechanism; instead, the weights are computed from scratch in each adaptation iteration. The development of a recursive version of the algorithm is left for future work.

The number of localization iterations between two successive adaptations is chosen empirically to obtain satisfactory performance. Note that choosing a small value increases the computational complexity without gaining much performance improvement, since adding only a small amount of unlabelled information does not change the weights significantly.


Algorithm 1: Manifold Regularization for Localization

System Adaptation:

Input : $N = l + u$ training points: $l$ labelled samples $\{x_i(n), y_i(n), p_i\}_{i=1}^{l}$ and $u$ unlabelled samples $\{x_i(n), y_i(n)\}_{i=l+1}^{N}$
Output: Interpolation weights $\mathbf{a}^*$

1) For each point, estimate the corresponding RTF $\mathbf{h}_i$.

2) Construct the reproducing kernel matrix $\mathbf{K}$ and the adjacency matrix $\mathbf{W}$, according to (6) and (12) respectively, based on $\{\mathbf{h}_i\}_{i=1}^{N}$.

3) Compute the expansion weights $\mathbf{a}^*$ according to (19).

Localization:

Input : A new pair of measurements $\{x(n), y(n)\}$ produced by an unknown source from an unknown location

Output: Estimated position $\hat{p}$

1) Estimate the corresponding RTF $\mathbf{h}$.

2) Compute the affinity between $\mathbf{h}$ and each of $\{\mathbf{h}_i\}_{i=1}^{N}$, using the reproducing kernel.

3) Estimate the new point location using the estimated interpolation weights: $\hat{p} = \hat{f}(\mathbf{h}) = \sum_{i=1}^{N} a_i^* k(\mathbf{h}_i, \mathbf{h})$.

After several newly acquired samples, return to System Adaptation and add the new unlabelled samples.


Fig. 1: Flow diagram of the proposed MRL algorithm. The algorithm consists of two parts: system adaptation and localization. In the adaptation part, both labelled and unlabelled samples are utilized to build a data-driven model for the RTFs and relate it to the position of the source. In the localization part, the position of a new pair of measurements is estimated based on the model learnt in the adaptation stage. The newly acquired unlabelled samples in the localization stage are propagated and utilized for system adaptation.


IV. Review of Localization Based on Diffusion
Mapping

In this section we briefly review a method for semi-supervised localization that was presented in [23]. This method, which will be termed DDS, is a dual-stage approach based on the concepts of diffusion maps [10]. In the first stage we recover the mapping between the original space $\mathbb{C}^D$ and the embedded space $\mathbb{R}^d$, which is governed by the controlling parameter, i.e. the position of the source. In the second stage, localization is performed by searching for the neighbours of the new point among the training set in the recovered space. Note that both the MRL and DDS algorithms rely on the information implied by the manifold $\mathcal{M}$. Nevertheless, there are several fundamental aspects that distinguish between the two, as will be elaborated in Section VI.


A. Parametrization of the Manifold

In the previous section we introduced a discrete representation of the manifold by a graph in which the training samples are the graph nodes, and the weights of the edges are defined according to the adjacency matrix $\mathbf{W}$ of (12). The adjacency graph is normalized to obtain the transition matrix $\mathbf{P} = \mathbf{D}^{-1}\mathbf{W}$, which defines a Markov process on the graph. Accordingly, $p(\mathbf{h}_i, \mathbf{h}_j) = P_{ij}$ represents the probability of transition in a single Markov step from node $\mathbf{h}_i$ to node $\mathbf{h}_j$.

A nonlinear mapping of the samples into a new embedded space is obtained by spectral decomposition of the transition matrix $\mathbf{P}$. The embedding is based on a parametrization of the manifold $\mathcal{M}$, which forms an intrinsic representation of the data. We apply singular value decomposition to the transition matrix $\mathbf{P}$, and pick the $d$ principal right-singular vectors $\{\boldsymbol{\varphi}_j\}_{j=1}^{d}$ that correspond to the $d$ largest singular values $\{\lambda_j\}_{j=1}^{d}$. The $d$ principal right-singular vectors form the diffusion mapping of the samples into a Euclidean space $\mathbb{R}^d$, defined by:

$$\Phi_d : \mathbf{h}_i \mapsto \left[ \lambda_1 \varphi_1^{(i)}, \ldots, \lambda_d \varphi_d^{(i)} \right]^T \tag{21}$$

where $\varphi_j^{(i)}$ denotes the $i$th entry of the vector $\boldsymbol{\varphi}_j$. Usually, $\boldsymbol{\varphi}_0$ is ignored since it is equal to a column vector of ones.

In the localization stage, the embedding should be extended, given a new RTF sample $\mathbf{h}$, corresponding to a new pair of measurements $\{x(n), y(n)\}$ produced by an unknown source from an unknown location. Owing to the Nyström extension, no further spectral decomposition is needed. The new spectral coordinates are obtained by:

$$\varphi_j^* = \frac{1}{\lambda_j} \mathbf{b}^T \boldsymbol{\varphi}_j, \quad j \in \{1, \ldots, d\} \tag{22}$$

where $\mathbf{b}$ is an affinity vector between the training set and the new test point:

$$b_i = \exp\left\{ -\|\mathbf{h}_i - \mathbf{h}\|^2 / \varepsilon_b \right\} \tag{23}$$
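The construction of a diffusion mapping and its extension to a new sample can be illustrated with a short NumPy sketch. Several choices here are assumptions made for the illustration only: the spectral decomposition is carried out as an eigen-decomposition of the transition matrix (through its symmetric conjugate), the affinity vector of the new point is normalized to sum to one like a row of the transition matrix, a plain Gaussian kernel replaces the truncated one, and the function names and toy arc-shaped features are hypothetical.

```python
import numpy as np

def gaussian_affinity(A, B, eps):
    """exp(-||a - b||^2 / eps) between all rows of A and all rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / eps)

def diffusion_map(H, eps, d):
    """Return eigenvalues (d,), eigenvectors (N, d) and the embedding (N, d)."""
    W = gaussian_affinity(H, H, eps)
    deg = W.sum(axis=1)
    # P = D^{-1} W is similar to the symmetric S = D^{-1/2} W D^{-1/2}.
    S = W / np.sqrt(np.outer(deg, deg))
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]              # descending eigenvalues
    lam, V = lam[order], V[:, order]
    phi = V / np.sqrt(deg)[:, None]            # right eigenvectors of P
    phi = phi / phi[0, 0]                      # scale so the trivial vector is all ones
    lam, phi = lam[1:d + 1], phi[:, 1:d + 1]   # drop the trivial eigenpair
    return lam, phi, phi * lam

def nystrom_extend(h_new, H, eps, lam, phi):
    """Spectral coordinates of a new sample: (1 / lam_j) * b^T phi_j."""
    b = gaussian_affinity(np.atleast_2d(h_new), H, eps)[0]
    b = b / b.sum()                            # normalize like a row of P (assumed)
    phi_new = (b @ phi) / lam
    return phi_new, lam * phi_new

theta = np.linspace(10.0, 60.0, 60)
H = np.column_stack([np.cos(np.deg2rad(theta)), np.sin(np.deg2rad(theta))])

lam, phi, emb = diffusion_map(H, eps=0.05, d=2)
# Extending a point that is already in the training set should reproduce
# its own coordinates, which gives a simple consistency check.
phi_new, emb_new = nystrom_extend(H[7], H, eps=0.05, lam=lam, phi=phi)
```

The consistency check works because each eigenvector satisfies P phi_j = lam_j phi_j, so applying the extension to a training row of P returns the corresponding entry of phi_j.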


B. Nearest Neighbour Search on the Manifold 

In Section III-A we described the structure of the acoustic manifold $\mathcal{M}$ of the RTFs. We stated that in order to properly measure affinities between RTFs, we should use the geodesic distance, which is the length of the shortest path on the manifold. An approximation of the geodesic distance is given by the diffusion distance, defined as:

$$D^2_{\mathrm{Diff}}(\mathbf{h}_i, \mathbf{h}_j) = \left\| p(\mathbf{h}_i, \cdot) - p(\mathbf{h}_j, \cdot) \right\|^2_{\phi_0} = \sum_{r=1}^{N} \left( p(\mathbf{h}_i, \mathbf{h}_r) - p(\mathbf{h}_j, \mathbf{h}_r) \right)^2 / \phi_0^{(r)}$$

where $\boldsymbol{\phi}_0$ is the most dominant left-singular vector of $\mathbf{P}$.

The diffusion distance incorporates information from the entire set to determine the connectivity between pairs of samples on the graph. Pairs of points that are closely related to the same subset of points in the graph are considered close to each other, and vice versa. It can be shown that the diffusion distance is equal to the Euclidean distance in the diffusion maps space when using all $N$ eigenvectors. This equivalence emphasizes the virtue of the diffusion mapping, as it indicates that the mapping preserves the affinity between points with respect to the manifold. The diffusion distance can be well approximated by only the first $d$ principal eigenvectors, i.e.,

$$D_{\mathrm{Diff}}(\mathbf{h}_i, \mathbf{h}_j) \cong \left\| \Phi_d(\mathbf{h}_i) - \Phi_d(\mathbf{h}_j) \right\| \tag{24}$$

Equipped with the ability to measure distances along the manifold using the diffusion distance, we are able to properly quantify the affinities between RTF samples. Samples that reside next to each other on the manifold are assumed to be physically adjacent, i.e., they are likely to represent sources from close positions. Thus, the position of a new sample can be estimated by searching for its neighbours on the manifold. Accordingly, the estimate is formulated as a weighted sum of the positions of the labelled samples, where the weights decay with the diffusion distance between the new sample and each of the labelled samples:

$$\hat{p} = \sum_{i=1}^{l} \gamma(\mathbf{h}_i)\, p_i \tag{25}$$

where the weights $\gamma(\mathbf{h}_i)$ are given by:

$$\gamma(\mathbf{h}_i) = \frac{\exp\left\{ -D_{\mathrm{Diff}}(\mathbf{h}, \mathbf{h}_i) / \varepsilon_\gamma \right\}}{\sum_{j=1}^{l} \exp\left\{ -D_{\mathrm{Diff}}(\mathbf{h}, \mathbf{h}_j) / \varepsilon_\gamma \right\}} \tag{26}$$
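The weighted estimate built from normalized exponential weights can be sketched as follows. The function name `dds_estimate`, the toy one-dimensional embedding and the scale value are hypothetical, and the diffusion distance is approximated by the Euclidean distance between embedded points.

```python
import numpy as np

def dds_estimate(emb_new, emb_labelled, p_labelled, eps_gamma):
    """Weighted sum of labelled positions with normalized exponential weights."""
    # Approximate diffusion distance: Euclidean distance in the embedded space.
    dist = np.linalg.norm(emb_labelled - emb_new, axis=1)
    logits = -dist / eps_gamma
    w = np.exp(logits - logits.max())    # shift for numerical stability
    w = w / w.sum()                      # weights sum to one
    return float(w @ p_labelled), w

# Three labelled samples on a 1-D embedding, with known angles in degrees.
emb_labelled = np.array([[0.0], [1.0], [2.0]])
p_labelled = np.array([10.0, 20.0, 30.0])

# A new sample that coincides with the middle labelled sample.
p_hat, weights = dds_estimate(np.array([1.0]), emb_labelled, p_labelled,
                              eps_gamma=0.5)
```

Because the two outer labelled samples are equally far from the new sample, their weights are equal and the estimate equals the middle label. Note that only labelled samples enter the sum.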

The DDS procedure is summarized in Algorithm 2.

Note that both labelled and unlabelled samples participate in the first stage, for the construction of the graph Laplacian. However, in the localization stage only the labelled samples are utilized, because we rely on the labellings. Though the MRL and DDS algorithms have evident similarities, we show in the experimental part that the latter is inferior due to its different utilization of unlabelled data.

Algorithm 2: DDS

Diffusion Mapping:

Input : $N = l + u$ training points: $l$ labelled samples $\{x_i(n), y_i(n), p_i\}_{i=1}^{l}$ and $u$ unlabelled samples $\{x_i(n), y_i(n)\}_{i=l+1}^{N}$
Output: Embedding $\Phi_d$

1) For each point, estimate the corresponding RTF $\mathbf{h}_i$.

2) Construct the graph $\mathbf{W}$ based on $\{\mathbf{h}_i\}_{i=1}^{N}$ and form the transition matrix $\mathbf{P}$.

3) Employ singular value decomposition of $\mathbf{P}$ and obtain the singular values $\{\lambda_j\}$ and the right-singular vectors $\{\boldsymbol{\varphi}_j\}$.

4) Construct the map $\Phi_d$ according to (21) to obtain an embedding that represents the intrinsic structure of the manifold $\mathcal{M}$.

Localization:

Input : A new pair of measurements $\{x(n), y(n)\}$ produced by an unknown source from an unknown location

Output: Estimated position $\hat{p}$

1) Estimate the corresponding RTF $\mathbf{h}$.

2) Apply the Nyström extension according to (22) to obtain the spectral coordinates of $\mathbf{h}$.

3) Compute the approximated diffusion distance between $\Phi_d(\mathbf{h})$ and each of the labelled samples $\{\Phi_d(\mathbf{h}_i)\}_{i=1}^{l}$, according to (24).

4) Estimate the new point location by (25) as a linear combination of the positions of the labelled samples, according to distances in the diffusion mapped space.


V. Experimental Results

A. Setup

We describe the simulated setup used for conducting the experimental study. We simulated a 6 × 6.2 × 3 m room, using an efficient implementation [33] of the image method. In the room there are two microphones located at (3, 3, 1) m and (3.2, 3, 1) m, respectively. The source is known to be positioned at 2 m distance with respect to the first microphone, on the same latitude. The goal is to recover the azimuth angle of the source. The initial analysis and examination of the algorithms is carried out assuming that the azimuth angle of the source ranges between 10° and 60°. Then, the algorithm performance is further demonstrated on a wider range of azimuth angles, between 0° and 180°. Fig. 2 illustrates the simulation setup.

For each location, we simulate a unique 3 s speech signal, sampled at 16 kHz. The clean speech is convolved with the corresponding AIR and is contaminated by WGN. This forms the measured signals at the two microphones. For each source location, the CPSD and the PSD are estimated with Welch's method, using 0.128 s windows and 75% overlap, and are utilized for estimating the RTF for $D = 2048$ frequency bins.
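A Welch-type RTF estimate with these settings can be sketched in plain NumPy. The sketch assumes that the RTF estimate is the ratio between the cross power spectral density of the two microphone signals and the power spectral density of the first one, and that the 2048 bins are those of a two-sided FFT of a 0.128 s window at 16 kHz; the Hann window, the function names and the synthetic test signals are hypothetical.

```python
import numpy as np

def welch_spectra(x, y, nperseg, overlap):
    """Averaged cross spectrum S_yx and auto spectrum S_xx over Hann-windowed segments."""
    hop = int(nperseg * (1 - overlap))
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    Sxx = np.zeros(nperseg)
    Syx = np.zeros(nperseg, dtype=complex)
    for s in starts:
        X = np.fft.fft(win * x[s:s + nperseg])
        Y = np.fft.fft(win * y[s:s + nperseg])
        Sxx += np.abs(X) ** 2
        Syx += Y * np.conj(X)
    n = len(starts)
    return Syx / n, Sxx / n

def estimate_rtf(x, y, fs=16000, win_sec=0.128, overlap=0.75):
    """RTF estimate as the ratio S_yx / S_xx (assumed estimator)."""
    nperseg = int(round(fs * win_sec))        # 2048 samples at 16 kHz
    Syx, Sxx = welch_spectra(x, y, nperseg, overlap)
    return Syx / Sxx                          # scaling constants cancel in the ratio

# Synthetic check: the second channel is a scaled, delayed copy of the first,
# so the magnitude of the estimated RTF should be close to the gain.
rng = np.random.default_rng(0)
x = rng.standard_normal(3 * 16000)            # 3 s of white noise
gain, delay = 0.5, 3
y = gain * np.concatenate([np.zeros(delay), x[:-delay]])

h = estimate_rtf(x, y)
```

In this noiseless toy case the estimate recovers the gain in every bin; with reverberation and additive noise the estimate becomes a noisy, high-dimensional feature vector, which is the setting the paper addresses.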

B. Analysis of the Manifold 

In this section we review the main results presented in [23]. We investigate the acoustic manifold of the RTFs and examine the proper distance between them that maintains physical adjacency. The analysis is carried out using a set of $N = 400$ RTF samples, corresponding to 400 positions distributed uniformly in the specified range. Two alternative distance measures for quantifying the affinity between different RTFs are addressed. We start with the Euclidean distance defined by:

$$D_{\mathrm{Euc}}(\mathbf{h}_i, \mathbf{h}_j) = \| \mathbf{h}_i - \mathbf{h}_j \| \tag{27}$$

The Euclidean distance is compared with the diffusion distance presented in Section IV-B.

Fig. 2: An illustration of the room setup. The purple arc marks the region where the source is assumed to be positioned. The red dots define the grid of the labelled examples.


Fig. 3(a) depicts the Euclidean distance and the diffusion distance between each of the RTFs and a reference RTF corresponding to 10°, as a function of the angle. We used a moderate reverberation time of 300 ms and an SNR of 20 dB. We observe that the monotonic behaviour of the Euclidean distance with respect to the angle is confined to a range of approximately 3.2°. Consequently, we conclude that the Euclidean distance is meaningful only for small arcs. Thus, in general, the Euclidean distance is not a good distance measure between RTFs. However, it can be properly utilized when inserted into a Gaussian kernel, in either the manifold regularization framework or the diffusion framework. Depending on its scaling parameter, the Gaussian kernel preserves small distances and suppresses large distances, which are meaningless. The kernel scale should be adjusted to the distance up to which monotonicity is maintained by the Euclidean distance, in order to preserve locality.

For the diffusion distance, only the first element in the mapping ($d = 1$) was considered. This choice will be justified in the sequel. We can see that for almost the entire range, the diffusion distance remains monotonic with respect to the angle, indicating that it is an appropriate metric in terms of the source DOA. Further insight into the mapping itself is gained by plotting the single-element mapping $\Phi_1(\cdot)$, as depicted in Fig. 3(b). We observe that the mapping corresponds well with the angle up to a monotonic distortion. Thus, the diffusion mapping successfully reveals the latent variable, namely, the position of the source. The almost perfect matching between the first element of the mapping and the corresponding angle justifies the use of $d = 1$ for estimating the diffusion distance.

Fig. 3: (a) The Euclidean distance and the diffusion distance between each of the RTFs and the RTF corresponding to 10°, as a function of the angle. The dashed line shows the boundary angle until which monotonicity is preserved for the Euclidean distance. (b) Single-element diffusion mapping $\Phi_1(\cdot)$.

To summarize, the presented results strengthen the claim on the existence of a nonlinear acoustic manifold. In small neighbourhoods around each point, the manifold is approximately flat, meaning that it resembles a Euclidean (linear) space. For larger scales, the affinity between RTFs should be determined according to the geodesic distance on the manifold. The diffusion framework successfully reveals the latent variable controlling the acoustic manifold, and the diffusion distance properly reflects the distances on the manifold. These results motivate the involvement of manifold aspects in the localization process, as introduced by either the MRL or the DDS algorithms.


C. Localization Results 

In this section we examine the ability of both DDS and MRL to recover the DOA of the source. The training set consists of $N = 400$ representative samples distributed uniformly between 10° and 60°. Among the training set, only $l = 6$ samples were labelled, creating a grid with approximately 10° distance between adjacent labelled samples, as depicted in Fig. 2. The performance is examined on a set of $T = 120$ additional samples produced by unknown sources from unknown locations, confined to the defined range. The performance is measured according to the root mean square error (RMSE), defined by:


$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} \left| \hat{p}_i - p_i \right|^2} \tag{28}$$


where $p$ stands for the azimuth angle of the source. To prevent the results from depending on a specific reflection pattern of a certain room section, we repeated the simulation with rotations of the constellation described above. The rotation angle was drawn uniformly between 0° and 360°. The positions of the second microphone, the training points and the test points were rotated by this angle with respect to the first microphone. The RMSE was averaged over 50 rotations of the constellation.
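The error measure and its averaging over rotations can be written compactly. This is a small sketch in which the function names and the example numbers are hypothetical, and it assumes that the reported figure is the mean of the per-rotation RMSE values.

```python
import numpy as np

def rmse(p_true, p_hat):
    """Root mean square error between true and estimated azimuth angles."""
    p_true = np.asarray(p_true, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    return float(np.sqrt(np.mean((p_hat - p_true) ** 2)))

def rmse_over_rotations(per_rotation_pairs):
    """Average the RMSE over (true, estimated) pairs, one pair per rotation."""
    return float(np.mean([rmse(t, h) for t, h in per_rotation_pairs]))

# Errors of 3 and 4 degrees give sqrt((9 + 16) / 2).
single = rmse([10.0, 20.0], [13.0, 16.0])

# Two rotations with RMSE 3 and 5 average to 4.
averaged = rmse_over_rotations([([0.0], [3.0]), ([0.0], [5.0])])
```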

The results of the MRL and the DDS algorithms are compared with those obtained by the classical GCC algorithm for both noisy and reverberant conditions. In the first scenario we examine the algorithms' performance for different reverberation times with a fixed SNR of 20 dB. In the second scenario the reverberation time is set to 300 ms, and different noise levels are examined. The training set is generated with a fixed SNR level of 10 dB. The RMSEs of the three algorithms in both scenarios are shown in Fig. 4(a) and (b), respectively.

It can be seen in Fig. 4(a) that the GCC performs well for low reverberation. However, its performance deteriorates gradually as reverberation increases, and becomes inferior to the performance of both the DDS and the MRL algorithms. In high reverberation, the GCC is incapable of distinguishing between the direct arrival and the reflections. A misidentification of the direct path results in a large estimation error. The proposed algorithms are more robust to reverberation, since the variations in the entire RTFs are taken into account.

Similar behaviour is observed in Fig. 4(b), in which different noise levels are examined. Here too, the GCC method behaves well only in high SNR conditions, and its performance significantly degrades as the noise level increases. When the measurements are contaminated by a significant amount of noise, the correlation between the two measurements is also very noisy, and the GCC cannot correctly identify the peak corresponding to the direct path. In contrast, the semi-supervised algorithms are much more robust with respect to the background noise, and most of the time obtain a lower error. This type of algorithm can compensate for the information loss caused by the poor conditions by capitalizing on the prior information inferred from the training samples.

We also observe that the MRL approach exhibits better results compared with the DDS method. The reason for the visible gap between the RMSEs of the two algorithms is related to the different ways they utilize unlabelled data, and will be further elaborated in Section VI.

Fig. 4: The RMSE of GCC, DDS and MRL (a) as a function of the reverberation time (SNR = 20 dB), and (b) as a function of SNR ($T_{60}$ = 300 ms).

Finally, we examine the iterative process of the MRL algorithm through the following sequential simulation. We used a reverberation time of 500 ms and an SNR of 20 dB. This time we examined a wider range of angles, between 0° and 180°. The initial adaptation was based on only 19 labelled samples, creating a grid of 10° distance between adjacent labelled samples, as depicted in Fig. 2. We conducted 9 cycles of the sequential algorithm, each comprising both stages of system adaptation and localization. In the localization stage, we estimated the angles of 90 new samples from unknown locations. The total RMSE of the entire set was computed. In the following iteration, these 90 new samples were treated as additional unlabelled data, utilized for system adaptation. The results are summarized in Fig. 5.

Fig. 5: The RMSE of an iterative simulation of MRL for angles in the range 0° to 180°, where 90 unlabelled points are added in each iteration. $T_{60}$ = 500 ms and SNR = 20 dB.

In this figure we observe that the RMSE decreases as a function of the number of iterations, indicating that the unlabelled data has an important role in reducing the estimation error. However, after a considerable amount of unlabelled data is accumulated, the process stabilizes on a certain error, and additional samples are redundant.
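The sequential protocol, in which each batch is first localized and then appended as unlabelled data before the weights are recomputed from scratch, can be sketched as follows. Everything specific here is an assumption made for illustration: the toy noiseless features on a half circle, the use of one Gaussian kernel for both the Gram and the adjacency matrix, the batch size of 30, the three cycles and all parameter values. Since the toy features are noiseless, the sketch shows the data flow only and is not expected to reproduce the decreasing error curve.

```python
import numpy as np

def kernel(A, B, eps):
    """Gaussian affinities exp(-||a - b||^2 / eps) between rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / eps)

def fit_weights(H, p_lab, eps, gamma_k, gamma_m):
    """Recompute the weights from scratch for the current training set."""
    N, l = len(H), len(p_lab)
    K = kernel(H, H, eps)
    W = K.copy()                       # same kernel reused as adjacency (simplification)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    J = np.diag(np.r_[np.ones(l), np.zeros(N - l)])
    q = np.r_[p_lab, np.zeros(N - l)]
    A = J @ K + l * gamma_k * np.eye(N) + l * gamma_m * (L @ K)
    return np.linalg.solve(A, q)

def features(theta_deg):
    """Toy stand-in for RTFs: points on the unit half circle."""
    t = np.deg2rad(theta_deg)
    return np.column_stack([np.cos(t), np.sin(t)])

rng = np.random.default_rng(1)
theta_lab = np.arange(0.0, 181.0, 10.0)        # 19 labelled angles
H = features(theta_lab)                        # labelled samples come first
errors = []

for cycle in range(3):
    # System adaptation: weights from all samples collected so far.
    a = fit_weights(H, theta_lab, eps=0.1, gamma_k=1e-4, gamma_m=1e-6)
    # Localization: estimate the angles of a new batch.
    theta_new = rng.uniform(0.0, 180.0, 30)
    H_new = features(theta_new)
    theta_hat = kernel(H_new, H, 0.1) @ a
    errors.append(float(np.sqrt(np.mean((theta_hat - theta_new) ** 2))))
    # The batch joins the training set as unlabelled data.
    H = np.vstack([H, H_new])
```

Keeping the labelled samples in the first rows of `H` is what lets the indicator matrix and the label vector be rebuilt without bookkeeping after every cycle.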

VI. Discussion 

In the previous section we demonstrated the robustness of 
the MRL and the DDS algorithms to noisy and reverberant 
conditions. We have also seen that the performance of the 
DDS method is inferior with respect to that of the MRL 
algorithm. In this section we discuss the interfacing points 
of both algorithms, on the one hand, and highlight the major 
differences between them, on the other hand. 

To investigate the role of the unlabelled data in the MRL method, we inspect the expansion weights $\mathbf{a}^*$ derived by the algorithm, as depicted in Fig. 6. The blue line corresponds to the weights of $u = 441$ unlabelled examples, while the red x-marks correspond to the weights of $l = 19$ labelled examples. We observe a monotonic, almost linear, behaviour of the coefficients with respect to the angle. The obtained behaviour of the MRL coefficients resembles the monotonic relation between the single-element diffusion mapping $\Phi_1(\cdot)$ and the corresponding angle, depicted in Fig. 3(b). The correspondence between the two algorithms suggests that they share similar aspects which lead to a parametrization of the manifold and recovery of the DOA of the source.

However, we have seen that the MRL is a better localizer compared with the DDS. The difference between the two is attributed to their different utilization of the unlabelled data. In the DDS algorithm, the unlabelled data are used only in the learning phase, and the estimation comprises only the positions of the labelled samples. In contrast, in MRL the unlabelled data not only take part in the recovery of the manifold, but also participate in the estimation itself, which involves both labelled and unlabelled data (15). Another advantage of MRL over DDS is that it is sequentially updated, hence, it is more suitable for on-line implementations.

Fig. 6: The estimated expansion weights $\mathbf{a}^*$ with respect to the corresponding angle. The blue line corresponds to the weights of $u = 441$ unlabelled examples, while the red x-marks correspond to the weights of $l = 19$ labelled examples.


VII. Conclusions 

A novel approach for semi-supervised localization, based on state-of-the-art manifold learning techniques, was presented. A set of representative samples in a defined room section is utilized for learning the acoustic manifold of the RTFs and building a data-driven model. Equipped with this knowledge, we find the function relating the samples and the corresponding positions by solving a regularized optimization problem in an RKHS. Simulation results confirm the robustness of the algorithm in noisy and reverberant environments.

Integrating traditional signal processing techniques with novel machine learning tools may be the key to better addressing adverse conditions, such as high noise levels and reverberation, which are the main causes of performance degradation in classical localization approaches. The current results indicate that the manifold perspective provides interesting insight into the general structure of the acoustic responses and offers better solutions for common signal processing problems.

Appendix A 

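We define the integral operator on functions, associated with the kernel $k$, by the following integral transform:

$$[T_k f](\mathbf{t}) = \int k(\mathbf{t}, \mathbf{s}) f(\mathbf{s})\, d\mathbf{s} = g(\mathbf{t}). \tag{29}$$

The eigenfunctions $\{\psi_i(\cdot)\}$ and eigenvalues $\{\lambda_i\}$ of the integral operator satisfy:

$$[T_k \psi_i](\mathbf{t}) = \int k(\mathbf{t}, \mathbf{s}) \psi_i(\mathbf{s})\, d\mathbf{s} = \lambda_i \psi_i(\mathbf{t}). \tag{30}$$

According to Mercer's theorem, the kernel $k$ can be expanded by:

$$k(\mathbf{t}, \mathbf{s}) = \sum_i \lambda_i \psi_i(\mathbf{t}) \psi_i(\mathbf{s}) \tag{31}$$

where the convergence is absolute and uniform. The eigenfunctions $\{\psi_i(\cdot)\}$ form an orthogonal set and the RKHS can be defined as the space of functions spanned by this set:

$$\mathcal{H}_k = \Big\{ f \,\Big|\, f(\cdot) = \sum_i a_i \psi_i(\cdot) \ \text{and} \ \|f\|_{\mathcal{H}_k} < \infty \Big\} \tag{32}$$

where the RKHS norm is defined by the inner product:

$$\langle f, g \rangle_{\mathcal{H}_k} = \Big\langle \sum_i a_i \psi_i(\cdot), \sum_i b_i \psi_i(\cdot) \Big\rangle_{\mathcal{H}_k} = \sum_i \frac{a_i b_i}{\lambda_i} \tag{33}$$

The reproducing property holds in this representation, since:

$$\begin{aligned}
\langle f(\cdot), k_{\mathbf{h}}(\cdot) \rangle_{\mathcal{H}_k} &= \Big\langle \sum_i a_i \psi_i(\cdot), \sum_j \lambda_j \psi_j(\mathbf{h}) \psi_j(\cdot) \Big\rangle_{\mathcal{H}_k} \\
&= \sum_i \sum_j a_i \lambda_j \psi_j(\mathbf{h}) \langle \psi_i(\cdot), \psi_j(\cdot) \rangle_{\mathcal{H}_k} \overset{(33)}{=} \sum_i a_i \psi_i(\mathbf{h}) = f(\mathbf{h})
\end{aligned} \tag{34}$$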


Appendix B 

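Theorem 1. The minimizer of the optimization problem (13) admits an expansion in terms of labelled and unlabelled examples:

$$f^*(\mathbf{h}) = \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \mathbf{h}) \tag{35}$$

Proof: Any function $f \in \mathcal{H}_k$ can be uniquely decomposed into two components, one lying in the linear subspace spanned by the kernel functions at the training examples, $f_\parallel \in \mathrm{span}\{k(\mathbf{h}_i, \cdot),\ i = 1, \ldots, N\}$, and the other lying in the orthogonal complement, $f_\perp$:

$$f = f_\parallel + f_\perp = \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot) + f_\perp \tag{36}$$

where $\langle f_\perp, k(\mathbf{h}_j, \cdot) \rangle = 0$ for all $1 \le j \le N$.

The above orthogonal decomposition and the reproducing property together show that the evaluation of $f$ at any training point $\mathbf{h}_j$, $1 \le j \le N$, is independent of the orthogonal component $f_\perp$:

$$\begin{aligned}
f(\mathbf{h}_j) &= \langle f(\cdot), k(\mathbf{h}_j, \cdot) \rangle \\
&= \Big\langle \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot) + f_\perp, k(\mathbf{h}_j, \cdot) \Big\rangle \\
&= \Big\langle \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot), k(\mathbf{h}_j, \cdot) \Big\rangle = \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \mathbf{h}_j)
\end{aligned} \tag{37}$$

Consequently, the values of the empirical terms involving the loss function and the intrinsic norm in the optimization problem (the first and the third terms, respectively) are independent of $f_\perp$. For the second term (the norm of $f$ in $\mathcal{H}_k$), since $f_\perp$ is orthogonal to $\sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot)$ and only increases the norm of $f$ in $\mathcal{H}_k$, we have

$$\begin{aligned}
\|f\|^2_{\mathcal{H}_k} &= \Big\| \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot) + f_\perp \Big\|^2_{\mathcal{H}_k} \\
&= \Big\| \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot) \Big\|^2_{\mathcal{H}_k} + \| f_\perp \|^2_{\mathcal{H}_k} \ge \Big\| \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \cdot) \Big\|^2_{\mathcal{H}_k}
\end{aligned} \tag{38}$$

Therefore, setting $f_\perp = 0$ does not affect the first and the third terms of (13), while it strictly decreases the second term whenever $f_\perp \neq 0$. It follows that any minimizer $f^*$ of (13) must have $f_\perp = 0$, and therefore admits a representation $f^*(\mathbf{h}) = \sum_{i=1}^{N} a_i k(\mathbf{h}_i, \mathbf{h})$.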